movielens_top40.csv from Canvas
author_count.csv from Canvas
We will be analysing the MovieLens dataset, which contains ratings of 58,000 movies by 280,000 users. The entire dataset is too big for us to work with in this lab, so it has been preprocessed down to a small subset. If you want to do more exploration yourself, the entire dataset can be downloaded here.
This part of the lab is based on a chapter in an online book by Rafael Irizarry. You can find it here. There are lots of examples in this book to show you how to use R for data science.
This part of the code is for interested students only. You do not need this for the lab.
# Here is the code used to preprocess the data (taken from the Irizarry lab):
library(dplyr)
library(tidyr)
ratings <- read.csv("ml-latest-small/ratings.csv", header = TRUE)
movies <- read.csv("ml-latest-small/movies.csv", header = TRUE)
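# Join on the shared movieId column (stated explicitly to avoid the join message)
movielens <- left_join(movies, ratings, by = "movieId")
# IDs of the 40 movies with the most ratings
top <- movielens %>%
  group_by(movieId) %>%
  summarize(n = n(), title = first(title)) %>%
  top_n(40, n) %>%
  pull(movieId)
# Keep users with at least 20 ratings among those movies, then reshape to movies x users
x <- movielens %>%
  filter(movieId %in% top) %>%
  group_by(userId) %>%
  filter(n() >= 20) %>%
  ungroup() %>%
  select(title, userId, rating) %>%
  spread(userId, rating)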
x <- as.data.frame(x)
rownames(x) <- x$title
x$title <- NULL
colnames(x) <- paste0("user_", colnames(x))
write.table(x, row.names = TRUE, col.names = TRUE, sep = ",", file = "movielens_top40.csv")
Load the data movielens_top40.csv into R. It contains the 40 movies with the most ratings, and the users who rated at least 20 of those 40 movies. Note that IDA refers to initial data analysis, an important component of all data analytics.
movielens <- read.csv("movielens_top40.csv", header = TRUE)
dim(movielens)
## [1] 40 153
print(movielens[1:5,1:5])
## user_1 user_6 user_7 user_15 user_17
## Aladdin (1992) NA 5 3.0 3 NA
## American Beauty (1999) 5 NA 4.0 4 4.0
## Apollo 13 (1995) NA 4 4.5 NA 3.5
## Back to the Future (1985) 5 NA 5.0 5 4.5
## Batman (1989) 4 3 3.0 NA 4.5
head(movielens)
## user_1 user_6 user_7 user_15 user_17 user_18 user_19
## Aladdin (1992) NA 5 3.0 3 NA 3.5 3
## American Beauty (1999) 5 NA 4.0 4 4.0 NA 4
## Apollo 13 (1995) NA 4 4.5 NA 3.5 NA NA
## Back to the Future (1985) 5 NA 5.0 5 4.5 4.0 4
## Batman (1989) 4 3 3.0 NA 4.5 NA 5
## Braveheart (1995) 4 5 NA NA 4.5 4.5 NA
## user_21 user_28 user_39 user_42 user_45 user_57
## Aladdin (1992) 4.0 NA 4 NA 5.0 4
## American Beauty (1999) 2.0 4.0 5 NA 5.0 5
## Apollo 13 (1995) NA NA NA 5 5.0 3
## Back to the Future (1985) 5.0 NA 4 4 3.5 4
## Batman (1989) 3.5 2.5 4 3 NA 4
## Braveheart (1995) NA 3.5 NA 4 5.0 4
## user_58 user_62 user_63 user_64 user_66 user_68
## Aladdin (1992) 5 NA 4.0 4.0 NA 3.5
## American Beauty (1999) NA NA 5.0 2.5 5 5.0
## Apollo 13 (1995) 4 NA 3.0 NA NA 3.0
## Back to the Future (1985) NA 4.5 5.0 NA 3 3.0
## Batman (1989) 3 NA 4.0 NA 4 4.0
## Braveheart (1995) 5 4.5 2.5 4.0 5 2.5
## user_72 user_82 user_84 user_86 user_91 user_96
## Aladdin (1992) NA 2.5 NA 4 3.5 NA
## American Beauty (1999) 4.5 NA NA 4 NA 5
## Apollo 13 (1995) 4.0 NA 5 NA 3.5 5
## Back to the Future (1985) 4.0 4.0 3 NA 3.5 NA
## Batman (1989) NA 3.5 3 NA 5.0 NA
## Braveheart (1995) 4.5 4.5 NA NA 4.0 5
## user_103 user_105 user_109 user_112 user_115 user_117
## Aladdin (1992) NA NA 3 NA 4 4
## American Beauty (1999) NA 5.0 NA NA 1 NA
## Apollo 13 (1995) 4.0 NA 3 4.0 NA 4
## Back to the Future (1985) NA NA NA 4.0 NA NA
## Batman (1989) NA NA 4 NA 5 3
## Braveheart (1995) 4.5 3.5 5 3.5 3 5
## user_122 user_132 user_135 user_137 user_140 user_141
## Aladdin (1992) NA 3.5 NA 4.0 3 4.0
## American Beauty (1999) NA 4.5 4 NA 4 NA
## Apollo 13 (1995) NA NA NA 3.5 5 3.5
## Back to the Future (1985) 5.0 3.5 NA 3.5 3 2.5
## Batman (1989) 4.5 2.0 5 NA NA NA
## Braveheart (1995) NA NA 4 4.0 4 3.5
## user_144 user_156 user_160 user_166 user_167 user_177
## Aladdin (1992) 4.5 NA NA 5.0 3.0 4
## American Beauty (1999) 4.0 4.5 5 4.0 3.0 4
## Apollo 13 (1995) 3.0 4.0 5 NA 4.0 4
## Back to the Future (1985) NA 3.5 5 NA NA 5
## Batman (1989) 3.5 NA 4 3.5 3.0 3
## Braveheart (1995) 4.5 NA 4 NA 3.5 NA
## user_178 user_179 user_182 user_186 user_187 user_195
## Aladdin (1992) NA NA NA 5 NA NA
## American Beauty (1999) 5.0 NA 5.0 NA 4 4
## Apollo 13 (1995) NA 4 2.5 NA NA 4
## Back to the Future (1985) 4.5 NA 3.0 NA NA 5
## Batman (1989) NA 3 3.5 4 NA NA
## Braveheart (1995) 4.0 5 3.5 NA 3 NA
## user_198 user_199 user_200 user_201 user_202 user_212
## Aladdin (1992) NA NA 4.0 NA 4 NA
## American Beauty (1999) 5 5 3.5 5 4 3.5
## Apollo 13 (1995) NA 4 4.0 4 4 NA
## Back to the Future (1985) 5 NA 4.0 5 4 NA
## Batman (1989) 3 3 NA 3 3 NA
## Braveheart (1995) 3 NA 4.5 NA 4 NA
## user_217 user_219 user_220 user_226 user_230 user_232
## Aladdin (1992) NA 4.5 5 4.0 2 3.0
## American Beauty (1999) NA 5.0 NA 4.0 NA NA
## Apollo 13 (1995) NA 4.0 5 4.5 2 4.5
## Back to the Future (1985) 3 3.5 5 4.0 NA 3.0
## Batman (1989) 2 3.5 NA NA 3 NA
## Braveheart (1995) 2 NA NA NA NA 4.5
## user_233 user_239 user_247 user_249 user_254 user_263
## Aladdin (1992) NA 4.0 5 4.0 NA NA
## American Beauty (1999) 3 5.0 4 4.5 5.0 4
## Apollo 13 (1995) 2 NA 3 2.5 4.0 4
## Back to the Future (1985) NA NA 4 4.5 3.5 NA
## Batman (1989) NA NA NA NA 2.5 NA
## Braveheart (1995) 3 4.5 4 5.0 4.0 4
## user_266 user_274 user_275 user_279 user_282 user_288
## Aladdin (1992) NA 4.0 NA 2.0 4.5 4
## American Beauty (1999) NA 5.0 4 3.5 4.5 NA
## Apollo 13 (1995) NA NA NA NA 4.5 3
## Back to the Future (1985) 4 3.5 4 3.5 5.0 5
## Batman (1989) 4 3.0 NA NA 3.5 3
## Braveheart (1995) 5 4.5 NA 4.0 NA 5
## user_292 user_298 user_304 user_305 user_307 user_308
## Aladdin (1992) 4.0 NA 4 NA 4.0 NA
## American Beauty (1999) NA 4.0 2 5.0 4.0 NA
## Apollo 13 (1995) NA NA 5 NA 2.0 NA
## Back to the Future (1985) 4.0 3.5 5 5.0 4.0 NA
## Batman (1989) 3.5 3.5 NA 2.5 4.0 NA
## Braveheart (1995) 2.5 3.0 5 NA 3.5 1
## user_313 user_314 user_317 user_318 user_322 user_328
## Aladdin (1992) NA 3 NA NA NA 3.5
## American Beauty (1999) 4 NA 5 3.5 4.5 NA
## Apollo 13 (1995) NA 4 3 NA 4.0 3.0
## Back to the Future (1985) 2 NA NA 2.5 NA 4.0
## Batman (1989) 5 3 NA NA NA 2.0
## Braveheart (1995) NA 4 5 NA 3.5 1.0
## user_330 user_332 user_334 user_339 user_352 user_354
## Aladdin (1992) 3.0 NA NA NA NA 3.5
## American Beauty (1999) 4.5 4.5 NA 5.0 5 4.0
## Apollo 13 (1995) 3.0 3.5 NA 4.0 NA 4.0
## Back to the Future (1985) 4.0 4.0 3.5 4.0 NA 4.0
## Batman (1989) 4.0 NA NA 2.5 NA 4.0
## Braveheart (1995) 3.5 3.5 NA NA NA NA
## user_357 user_362 user_368 user_370 user_372 user_376
## Aladdin (1992) 4.5 NA NA NA 4 NA
## American Beauty (1999) 3.5 NA 4 3.5 NA NA
## Apollo 13 (1995) 3.5 NA NA NA 3 5.0
## Back to the Future (1985) 4.0 NA NA NA 5 4.5
## Batman (1989) 3.0 4.5 3 4.0 3 NA
## Braveheart (1995) 4.0 4.0 4 NA 4 3.5
## user_380 user_381 user_382 user_385 user_387 user_391
## Aladdin (1992) 5 4.0 5 4 2.5 NA
## American Beauty (1999) NA NA NA NA 4.5 4
## Apollo 13 (1995) NA 3.5 4 5 NA 4
## Back to the Future (1985) 5 4.0 NA 4 2.0 4
## Batman (1989) 3 NA NA 3 4.0 4
## Braveheart (1995) 4 NA NA NA 3.5 5
## user_399 user_414 user_415 user_425 user_428 user_432
## Aladdin (1992) NA 4 4.0 3.0 2.0 NA
## American Beauty (1999) 0.5 5 3.5 3.0 3.5 3.5
## Apollo 13 (1995) NA 4 4.0 3.0 2.0 NA
## Back to the Future (1985) 5.0 5 NA NA NA NA
## Batman (1989) NA 4 NA 3.5 3.0 NA
## Braveheart (1995) 3.0 5 NA 4.0 2.5 4.0
## user_434 user_438 user_448 user_452 user_453 user_462
## Aladdin (1992) 4.0 4.0 NA NA 5 NA
## American Beauty (1999) 5.0 NA 4 4 5 3.5
## Apollo 13 (1995) 5.0 4.0 3 NA NA NA
## Back to the Future (1985) 3.5 4.0 5 4 NA 1.5
## Batman (1989) NA 4.0 3 5 NA 3.0
## Braveheart (1995) 4.5 4.5 NA 5 5 NA
## user_464 user_469 user_470 user_474 user_477 user_480
## Aladdin (1992) NA 2 3 4.0 3.0 4.0
## American Beauty (1999) 4 5 NA 3.5 4.5 4.0
## Apollo 13 (1995) NA NA 3 4.5 4.0 3.5
## Back to the Future (1985) NA 3 NA 4.5 4.5 5.0
## Batman (1989) NA 3 3 4.0 NA 4.5
## Braveheart (1995) 5 5 5 3.0 NA 5.0
## user_483 user_489 user_514 user_517 user_522 user_524
## Aladdin (1992) 4.0 3.5 4.0 3.0 4 4
## American Beauty (1999) 4.0 4.0 4.0 1.0 5 NA
## Apollo 13 (1995) 2.0 3.5 4.0 NA NA 5
## Back to the Future (1985) 4.5 3.5 5.0 5.0 5 5
## Batman (1989) 3.5 4.0 2.5 3.0 NA 3
## Braveheart (1995) 4.0 4.5 NA 1.5 4 3
## user_525 user_534 user_551 user_555 user_559 user_560
## Aladdin (1992) 3.5 4.5 NA NA 4 NA
## American Beauty (1999) 4.0 3.5 NA 5 NA 4
## Apollo 13 (1995) 4.0 NA NA 4 3 4
## Back to the Future (1985) 4.0 5.0 4.0 3 NA NA
## Batman (1989) NA 4.0 NA 3 3 NA
## Braveheart (1995) NA NA 3.5 5 4 4
## user_561 user_562 user_570 user_573 user_577 user_580
## Aladdin (1992) NA 4 NA 4.5 NA 2.0
## American Beauty (1999) 3.5 5 4.0 2.0 NA 5.0
## Apollo 13 (1995) NA 3 4.0 3.0 NA NA
## Back to the Future (1985) 4.5 NA 4.0 4.5 5 3.5
## Batman (1989) 4.5 NA NA 4.5 2 3.0
## Braveheart (1995) 5.0 4 3.5 5.0 4 4.5
## user_586 user_590 user_593 user_594 user_596 user_597
## Aladdin (1992) 4.5 4.0 3.5 4.5 NA 4
## American Beauty (1999) NA 3.0 4.5 NA NA 5
## Apollo 13 (1995) NA 4.5 3.0 3.5 3.5 NA
## Back to the Future (1985) 4.5 4.5 NA NA 4.0 5
## Batman (1989) NA 3.5 NA 4.5 3.5 4
## Braveheart (1995) 5.0 4.0 3.0 5.0 NA 5
## user_599 user_600 user_602 user_603 user_606 user_607
## Aladdin (1992) 3.0 3.5 NA NA NA NA
## American Beauty (1999) 5.0 4.5 NA 5 4.5 3
## Apollo 13 (1995) 2.5 2.0 4 NA NA 5
## Back to the Future (1985) 3.5 4.5 NA 2 3.5 3
## Batman (1989) 3.5 2.5 4 2 3.5 3
## Braveheart (1995) 3.5 2.0 5 1 3.5 5
## user_608 user_610
## Aladdin (1992) 3 NA
## American Beauty (1999) 5 3.5
## Apollo 13 (1995) 2 NA
## Back to the Future (1985) 2 5.0
## Batman (1989) 3 4.5
## Braveheart (1995) 4 4.5
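As part of the IDA it is worth quantifying how much of the matrix is missing before clustering. The following is a minimal sketch on a made-up 3 x 3 ratings matrix (the movie names and values are hypothetical, not taken from MovieLens). The same calls can be applied to movielens directly.

```r
# Hypothetical toy ratings: movies in rows, users in columns
toy <- matrix(c( 5, NA,  3,
                 4,  4, NA,
                NA, NA,  2),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Movie A", "Movie B", "Movie C"),
                              c("user_1", "user_2", "user_3")))

# Overall proportion of missing ratings
overall_missing <- mean(is.na(toy))

# Number of observed ratings per movie (rows) and per user (columns)
ratings_per_movie <- rowSums(!is.na(toy))
ratings_per_user <- colSums(!is.na(toy))

# Average rating per movie, ignoring missing values
mean_rating <- rowMeans(toy, na.rm = TRUE)

overall_missing
ratings_per_movie
ratings_per_user
mean_rating
```

For the lab data, replacing toy with movielens gives the same summaries for the 40 movies and 153 users.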
In this case, the data is laid out as the transpose of a typical data layout: the movies, which are the objects we want to cluster, appear as the rows, and each user's ratings appear as a column. This is intentional, because the dist() function used below computes pairwise distances between rows.
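Since the ratings matrix contains many NAs, it helps to see how dist() treats them. The sketch below uses a made-up two-movie matrix (hypothetical values). It illustrates the behaviour described in the dist() help page: only columns observed in both rows are used, and the sum is scaled up to the full number of columns.

```r
# Hypothetical toy ratings: 2 movies (rows) rated by 4 users (columns)
toy <- rbind(
  movie_a = c(5, NA, 3, 4),
  movie_b = c(4, 2, NA, 4)
)

# dist() works row by row, so this is the distance between the two movies
d_toy <- dist(toy)  # Euclidean by default

# Manual version: only users 1 and 4 rated both movies
shared <- !is.na(toy["movie_a", ]) & !is.na(toy["movie_b", ])
sq_sum <- sum((toy["movie_a", shared] - toy["movie_b", shared])^2)
# Rescale the sum of squares from the 2 shared columns up to all 4 columns
d_manual <- sqrt(sq_sum * ncol(toy) / sum(shared))

print(d_toy)
print(d_manual)
```

If two rows share no observed columns at all, dist() returns NA for that pair, which hclust() cannot handle. This does not arise in the lab data because each retained user rated at least 20 of the 40 movies, so any two movies share some raters.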
Given the large number of variables, a natural high-dimensional visualization method is to cluster the movies based on the different user ratings. We will look at how to do this in R.
hclust usage
Perform hierarchical clustering using the hclust() function and plot the resulting dendrogram. Try it with the complete (the default), average and single linkage methods.
d <- dist(movielens)
h <- hclust(d)
plot(h, cex = 0.4)
h_avg <- hclust(d, method = "average")
plot(h_avg, cex = 0.4)
h_single <- hclust(d, method = "single")
plot(h_single, cex = 0.4)
hclust
Use the cutree() function on the output of hclust() (with default settings) to separate the movie titles into four clusters. Can you extract the movies in cluster 1? We can also cut the tree by specifying the height at which it should be cut. Can you find the value of h that cuts the tree into four clusters?
movie_groups <- cutree(h, k = 4)
head(movie_groups)
## Aladdin (1992) American Beauty (1999) Apollo 13 (1995)
## 1 2 3
## Back to the Future (1985) Batman (1989) Braveheart (1995)
## 1 3 2
split(names(movie_groups), movie_groups)
## $`1`
## [1] "Aladdin (1992)" "Back to the Future (1985)"
## [3] "Lion King, The (1994)" "Shrek (2001)"
## [5] "Toy Story (1995)"
##
## $`2`
## [1] "American Beauty (1999)"
## [2] "Braveheart (1995)"
## [3] "Fargo (1996)"
## [4] "Fight Club (1999)"
## [5] "Forrest Gump (1994)"
## [6] "Godfather, The (1972)"
## [7] "Lord of the Rings: The Fellowship of the Ring, The (2001)"
## [8] "Lord of the Rings: The Return of the King, The (2003)"
## [9] "Lord of the Rings: The Two Towers, The (2002)"
## [10] "Matrix, The (1999)"
## [11] "Pulp Fiction (1994)"
## [12] "Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)"
## [13] "Saving Private Ryan (1998)"
## [14] "Schindler's List (1993)"
## [15] "Seven (a.k.a. Se7en) (1995)"
## [16] "Shawshank Redemption, The (1994)"
## [17] "Silence of the Lambs, The (1991)"
## [18] "Star Wars: Episode IV - A New Hope (1977)"
## [19] "Star Wars: Episode V - The Empire Strikes Back (1980)"
## [20] "Star Wars: Episode VI - Return of the Jedi (1983)"
## [21] "Terminator 2: Judgment Day (1991)"
## [22] "Usual Suspects, The (1995)"
##
## $`3`
## [1] "Apollo 13 (1995)"
## [2] "Batman (1989)"
## [3] "Dances with Wolves (1990)"
## [4] "Gladiator (2000)"
## [5] "Sixth Sense, The (1999)"
## [6] "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)"
##
## $`4`
## [1] "Fugitive, The (1993)"
## [2] "Independence Day (a.k.a. ID4) (1996)"
## [3] "Jurassic Park (1993)"
## [4] "Men in Black (a.k.a. MIB) (1997)"
## [5] "Mission: Impossible (1996)"
## [6] "Speed (1994)"
## [7] "True Lies (1994)"
table(movie_groups)
## movie_groups
## 1 2 3 4
## 5 22 6 7
cutree(h, h = 16)
## Aladdin (1992)
## 1
## American Beauty (1999)
## 2
## Apollo 13 (1995)
## 3
## Back to the Future (1985)
## 1
## Batman (1989)
## 3
## Braveheart (1995)
## 2
## Dances with Wolves (1990)
## 3
## Fargo (1996)
## 2
## Fight Club (1999)
## 2
## Forrest Gump (1994)
## 2
## Fugitive, The (1993)
## 4
## Gladiator (2000)
## 3
## Godfather, The (1972)
## 2
## Independence Day (a.k.a. ID4) (1996)
## 4
## Jurassic Park (1993)
## 4
## Lion King, The (1994)
## 1
## Lord of the Rings: The Fellowship of the Ring, The (2001)
## 2
## Lord of the Rings: The Return of the King, The (2003)
## 2
## Lord of the Rings: The Two Towers, The (2002)
## 2
## Matrix, The (1999)
## 2
## Men in Black (a.k.a. MIB) (1997)
## 4
## Mission: Impossible (1996)
## 4
## Pulp Fiction (1994)
## 2
## Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)
## 2
## Saving Private Ryan (1998)
## 2
## Schindler's List (1993)
## 2
## Seven (a.k.a. Se7en) (1995)
## 2
## Shawshank Redemption, The (1994)
## 2
## Shrek (2001)
## 1
## Silence of the Lambs, The (1991)
## 2
## Sixth Sense, The (1999)
## 3
## Speed (1994)
## 4
## Star Wars: Episode IV - A New Hope (1977)
## 2
## Star Wars: Episode V - The Empire Strikes Back (1980)
## 2
## Star Wars: Episode VI - Return of the Jedi (1983)
## 2
## Terminator 2: Judgment Day (1991)
## 2
## Toy Story (1995)
## 1
## True Lies (1994)
## 4
## Twelve Monkeys (a.k.a. 12 Monkeys) (1995)
## 3
## Usual Suspects, The (1995)
## 2
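Rather than reading the cut height off the dendrogram, it can be derived from the merge heights stored in the hclust object. The sketch below uses hypothetical random toy data, not the lab data. It relies on the fact that cutting anywhere between the (k-1)-th and k-th largest merge heights yields k clusters.

```r
# Hypothetical toy data: 10 items in 3 dimensions
set.seed(1)
toy <- matrix(rnorm(30), nrow = 10,
              dimnames = list(paste0("item_", 1:10), NULL))
h_toy <- hclust(dist(toy))  # complete linkage by default

k <- 4
# Merge heights, from the last (largest) merge to the first
hts <- sort(h_toy$height, decreasing = TRUE)
# Any height between the (k-1)-th and k-th largest merge gives k clusters;
# here we take the midpoint
cut_at <- mean(hts[c(k - 1, k)])

by_k <- cutree(h_toy, k = k)
by_h <- cutree(h_toy, h = cut_at)

cut_at
table(by_k, by_h)  # all counts on the diagonal: the two cuts agree
```

Applied to the lab objects, sort(h$height, decreasing = TRUE)[3:4] would give the range of heights that produce four clusters.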
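Recode the data as binary (1 if the user rated the movie, 0 otherwise) and cluster the movies using the Manhattan distance. Use cutree to find 4 clusters and compare to your result in the previous question.
movielens_mat <- as.matrix(movielens)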
movielens_mat[is.na(movielens_mat)] <- 0
movielens_mat[movielens_mat > 0] <- 1
d_man <- dist(movielens_mat, method = "manhattan")
h_man <- hclust(d_man)
plot(h_man, cex = 0.5)
movie_groups_man <- cutree(h_man, k = 4)
library(dendextend)  # provides tanglegram()
tanglegram(as.dendrogram(h_man), as.dendrogram(h))
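One simple way to compare two sets of cluster labels is a cross-tabulation. The following sketch uses a made-up ratings matrix (hypothetical data and object names) and mirrors the two clusterings above: one on the ratings themselves and one on the rated/not-rated pattern.

```r
set.seed(2)
# Hypothetical toy ratings: 8 movies x 12 users, ratings 1 to 5
toy <- matrix(sample(1:5, 8 * 12, replace = TRUE), nrow = 8,
              dimnames = list(paste0("movie_", 1:8), paste0("user_", 1:12)))
# Blank out some ratings, but keep users 1 and 2 complete so that every pair
# of movies shares at least two ratings (otherwise dist() would return NA)
holes <- cbind(sample(1:8, 25, replace = TRUE),
               sample(3:12, 25, replace = TRUE))
toy[holes] <- NA

# Clustering 1: Euclidean distance on the ratings themselves
groups_rating <- cutree(hclust(dist(toy)), k = 4)

# Clustering 2: Manhattan distance on rated (1) vs not rated (0)
toy_bin <- ifelse(is.na(toy), 0, 1)
groups_rated <- cutree(hclust(dist(toy_bin, method = "manhattan")), k = 4)

# Rows: clusters from the ratings; columns: clusters from the binary pattern
comparison <- table(groups_rating, groups_rated)
comparison
```

With the lab objects the equivalent call is table(movie_groups, movie_groups_man). Cluster numbers are arbitrary labels, so look for rows and columns that share most of their movies rather than for matching numbers.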
## Visualize the data [Optional]
R also offers a number of packages that let you visualize the data together with the clustering tree. These visualizations are called “heatmaps” of the data matrix. Install the package ComplexHeatmap using the commented-out code below. The function Heatmap expects a matrix, so we convert the data frame first. The arguments row_names_gp and column_names_gp let us reduce the font size.
# BiocManager::install("ComplexHeatmap")
# BiocManager::install("shape")
library(ComplexHeatmap)
movielens_matrix <- as.matrix(movielens)
Heatmap(movielens_matrix,
        row_names_gp = gpar(fontsize = 7),
        column_names_gp = gpar(fontsize = 7))
Suppose we would like to compare two trees and visualize the comparison. R has a package called dendextend for comparing two dendrograms. It has the following key functions:
- untangle(): finds an alignment,
- tanglegram(): visualises the two dendrograms,
- entanglement(): computes the quality of the alignment.
library(dendextend)
# Create two dendrograms
h_avg <- hclust(d, method = "average")
h_single <- hclust(d, method = "single")
dend1 <- as.dendrogram(h_avg)
dend2 <- as.dendrogram(h_single)
# Create a list to hold dendrograms
dend_list <- dendlist(dend1, dend2)
# Compare the two trees
tanglegram(dend_list)
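The untangle() and entanglement() functions listed above can be combined as in the sketch below. It uses hypothetical random toy data, and the choice of method = "step2side" is only one of the options dendextend offers, picked here for illustration.

```r
library(dendextend)

# Hypothetical toy data: 10 items in 4 dimensions
set.seed(3)
toy <- matrix(rnorm(40), nrow = 10,
              dimnames = list(paste0("item_", 1:10), NULL))
d_toy <- dist(toy)
dends <- dendlist(as.dendrogram(hclust(d_toy, method = "average")),
                  as.dendrogram(hclust(d_toy, method = "single")))

# Entanglement lies between 0 (no crossing lines) and 1 (fully entangled)
before <- entanglement(dends)

# untangle() rotates branches, without changing either tree, to align the labels
dends_untangled <- untangle(dends, method = "step2side")
after <- entanglement(dends_untangled)

c(before = before, after = after)
# tanglegram(dends_untangled)  # draw the aligned pair
```

A lower entanglement after untangling means the tanglegram is easier to read. The trees themselves are the same as before.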
Next, let’s explore the kmeans method. Go back to the original movies dataset, with ratings between 0.5 and 5 and missing values. Make a new dataset that replaces all the NAs with 0 but keeps the ratings. We are doing this because the kmeans function cannot handle missing values. In a later module, we will look at how to handle missing values. Use kmeans to cluster the movies into four clusters. How many movies are in each cluster?
movielens_mat <- as.matrix(movielens)
movielens_mat[is.na(movielens_mat)] <- 0
kmeans_res <- kmeans(movielens_mat, centers = 4)
table(kmeans_res$cluster)
##
## 1 2 3 4
## 14 6 16 4
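kmeans() starts from randomly chosen centers, so the cluster sizes above can change from run to run. A minimal sketch of how to make the result reproducible, using hypothetical random toy data, is to fix the seed and let kmeans() try several starts through its nstart argument:

```r
# Hypothetical toy data: 20 items in 3 dimensions
set.seed(4)
toy <- matrix(rnorm(60), nrow = 20)

# Same seed + same call => same clustering
set.seed(5003)
fit_a <- kmeans(toy, centers = 4, nstart = 25)
set.seed(5003)
fit_b <- kmeans(toy, centers = 4, nstart = 25)

identical(fit_a$cluster, fit_b$cluster)
table(fit_a$cluster)
```

With nstart = 25, kmeans() runs 25 random starts and keeps the one with the smallest total within-cluster sum of squares, which makes a poor local optimum less likely.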
To visualise the kmeans clustering, use a dimension reduction technique such as PCA.
movie_pc = prcomp(movielens_mat, scale = TRUE)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(ggplot2)  # needed for ggplot()
movie.df = data.frame(PC1 = movie_pc$x[,1], PC2 = movie_pc$x[,2],
                      labels = factor(kmeans_res$cluster))
ggplot(movie.df, aes(PC1, PC2, colour = labels)) + geom_point() + theme_minimal()
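A plot of the first two principal components shows only part of the variation in the data, so it is useful to check how much. The sketch below uses hypothetical random toy data and computes the proportion of variance explained from the sdev component of a prcomp() fit.

```r
# Hypothetical toy data: 20 items in 4 dimensions
set.seed(5)
toy <- matrix(rnorm(80), nrow = 20)
pc_toy <- prcomp(toy, scale. = TRUE)

# Variance of each component as a share of the total variance
var_explained <- pc_toy$sdev^2 / sum(pc_toy$sdev^2)

round(var_explained, 3)
sum(var_explained[1:2])  # share captured by a PC1 vs PC2 scatter plot
```

For the lab data the same calculation on movie_pc$sdev indicates how faithfully the two-dimensional plot represents the clusters.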
Let’s now look at the cluster statistics. Can you plot the total within-group sum of squares for k = 2, 3, 4, 5, 6 from kmeans()? The tot.withinss value is part of the output of kmeans. Repeat for the between-group sum of squares (betweenss). Do the plots hint at the best k?
set.seed(5003)
kmeans_2 <- kmeans(movielens_mat, centers = 2)
kmeans_3 <- kmeans(movielens_mat, centers = 3)
kmeans_4 <- kmeans(movielens_mat, centers = 4)
kmeans_5 <- kmeans(movielens_mat, centers = 5)
kmeans_6 <- kmeans(movielens_mat, centers = 6)
tot.withinss <- c(kmeans_2$tot.withinss, kmeans_3$tot.withinss,
kmeans_4$tot.withinss, kmeans_5$tot.withinss,
kmeans_6$tot.withinss)
betweenss <- c(kmeans_2$betweenss, kmeans_3$betweenss,
kmeans_4$betweenss, kmeans_5$betweenss,
kmeans_6$betweenss)
# or more directly using lapply/sapply over a larger range
set.seed(5003)
center.seq <- 2:39
# Store the fits under a new name so the kmeans() function is not masked
kmeans_fits <- lapply(center.seq, function(x) kmeans(movielens_mat, centers = x))
tot.within.ss <- sapply(kmeans_fits, "[[", "tot.withinss")
between.ss <- sapply(kmeans_fits, "[[", "betweenss")
plot(center.seq, tot.within.ss,
xlab = "Number of clusters", main = "Within group SS",
type = 'l')
Notice that the total within-group sum of squares decreases monotonically as the number of clusters increases.
plot(center.seq, between.ss,
xlab = "Number of clusters", main = "Between group SS",
type = 'l')
Likewise, the between-group sum of squares increases as the number of clusters increases.
Because both curves move in one direction as k grows, neither on its own identifies the best k. A more informative metric for assessing the number of clusters is the ratio of between to within SS.
plot(center.seq, between.ss/tot.within.ss, xlab = "Number of clusters",
     main = "Ratio of Between / Within SS", type = 'l')
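The opposite trends of the two curves follow from a decomposition: for any kmeans() fit, the total sum of squares (totss) equals tot.withinss plus betweenss, and totss does not depend on k. The sketch below checks this on hypothetical random toy data.

```r
# Hypothetical toy data: 30 items in 3 dimensions
set.seed(6)
toy <- matrix(rnorm(90), nrow = 30)

ks <- 2:6
fits <- lapply(ks, function(k) kmeans(toy, centers = k, nstart = 10))

total_ss <- sapply(fits, "[[", "totss")
within_ss <- sapply(fits, "[[", "tot.withinss")
between_ss <- sapply(fits, "[[", "betweenss")

# total_ss is the same for every k, and within + between always adds up to it
ss_table <- data.frame(k = ks,
                       within = within_ss,
                       between = between_ss,
                       total = within_ss + between_ss,
                       share_between = between_ss / total_ss)
ss_table
```

Since the total is fixed, any drop in the within SS shows up as an equal rise in the between SS. A common heuristic is to look for the k beyond which share_between improves only slowly.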
Create a shiny app for the author_count data that lets the user choose which visualization technique to use and tune any parameters it needs.
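One possible starting point is sketched below. The layout of author_count.csv is not described here, so the sketch uses a made-up count matrix (toy_counts) as a stand-in, and the input names (technique, k, linkage) are illustrative choices rather than a required design.

```r
library(shiny)

# Hypothetical stand-in for the author_count data: items to cluster in rows,
# numeric counts in columns. Swap in the real data once its layout is known.
set.seed(1)
toy_counts <- matrix(rpois(12 * 6, lambda = 5), nrow = 12,
                     dimnames = list(paste0("item_", 1:12),
                                     paste0("var_", 1:6)))

ui <- fluidPage(
  titlePanel("Clustering explorer (sketch)"),
  sidebarLayout(
    sidebarPanel(
      selectInput("technique", "Visualization technique",
                  choices = c("Dendrogram" = "hclust",
                              "k-means on PCA" = "kmeans")),
      sliderInput("k", "Number of clusters", min = 2, max = 6, value = 3),
      # The linkage choice is only relevant for hierarchical clustering
      conditionalPanel(
        condition = "input.technique == 'hclust'",
        selectInput("linkage", "Linkage method",
                    choices = c("complete", "average", "single"))
      )
    ),
    mainPanel(plotOutput("cluster_plot"))
  )
)

server <- function(input, output, session) {
  output$cluster_plot <- renderPlot({
    if (input$technique == "hclust") {
      h <- hclust(dist(toy_counts), method = input$linkage)
      plot(h, cex = 0.8)
      rect.hclust(h, k = input$k)  # outline the k clusters on the tree
    } else {
      km <- kmeans(toy_counts, centers = input$k, nstart = 10)
      pc <- prcomp(toy_counts, scale. = TRUE)
      plot(pc$x[, 1], pc$x[, 2], col = km$cluster, pch = 19,
           xlab = "PC1", ylab = "PC2")
    }
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app interactively
```

The conditional panel keeps the interface uncluttered by showing the linkage selector only when the dendrogram is chosen. Further techniques can be added as extra choices and branches.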